Normative: Update String toLocale{Lower,Upper}Case to ResolveLocale with best-fit matching #956

gibson042 · 2025-01-30T15:41:05Z

Fixes #896

Do not ignore locales after the first in the list returned by CanonicalizeLocaleList(locales) (observable via e.g. "I".toLocaleLowerCase(["zzz", "tr"]) === "ı").
Match against the items of that list by best-fit rather than prefix, aligning with the rest of ECMA-402 (although the difference is not necessarily observable).
When the list is not empty but no matching locale is found, default to DefaultLocale() rather than "und", aligning with empty-list behavior and with the rest of ECMA-402 (observable via e.g. "I".toLocaleLowerCase("zzz") === "I".toLocaleLowerCase() regardless of default locale, just like new Intl.Collator("zzz", { sensitivity: "variant" }).compare("i", "ı") === new Intl.Collator(undefined, { sensitivity: "variant" }).compare("i", "ı")).
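
For illustration only (this is not part of the PR), the three points above can be checked with a small probe. The helper names `lowerI` and `describeChar` are hypothetical, and what gets printed depends on the engine and, for some inputs, on its default locale, so no output is shown.

```javascript
// Probe how the running engine lowercases "I" for a given `locales` argument.
function lowerI(locales) {
  return "I".toLocaleLowerCase(locales);
}

// Give the resulting character a readable label.
function describeChar(ch) {
  if (ch === "\u0131") return "dotless ı (U+0131)";
  if (ch === "i") return "dotted i (U+0069)";
  return "other: " + JSON.stringify(ch);
}

const probes = [
  ["tr"],        // Turkish tailoring requested directly
  ["zzz", "tr"], // first entry unsupported: is "tr" still considered?
  ["zzz"],       // nothing matches: the fallback target is what the PR changes
  ["und"],       // root locale
];

for (const locales of probes) {
  console.log(JSON.stringify(locales), "->", describeChar(lowerI(locales)));
}
console.log("default ->", describeChar(lowerI(undefined)));
```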

anba · 2025-01-30T16:01:45Z

Do not ignore locales after the first in the list returned by CanonicalizeLocaleList(locales) (observable via e.g. "I".toLocaleLowerCase(["zzz", "tr"]) === "ı").

But also "I".toLocaleLowerCase(["en", "tr"]) === "ı", because "en" generally doesn't have locale-sensitive case mappings, which means the next locale in the list gets selected.

Match against the items of that list by best-fit rather than prefix, aligning with the rest of ECMA-402 (although the difference is not necessarily observable).

"best fit" matching isn't supported in browsers. (V8 has "--harmony-intl-best-fit-matcher", but that's not available by default.)

When the list is not empty but no matching locale is found, default to DefaultLocale() rather than "und", aligning with empty-list behavior and with the rest of ECMA-402 [...]

That means "I".toLocaleLowerCase("und") can now return either "i" or "ı", depending on the user-locale.

gibson042 · 2025-01-30T16:33:15Z

Do not ignore locales after the first in the list returned by CanonicalizeLocaleList(locales) (observable via e.g. "I".toLocaleLowerCase(["zzz", "tr"]) === "ı").

But also "I".toLocaleLowerCase(["en", "tr"]) === "ı", because "en" generally doesn't have locale-sensitive case mappings, which means the next locale in the list gets selected.

Ah yeah, I guess the Available Locales List needs to include more than just locale identifiers with language-sensitive case mappings. Any thoughts on what it should be? Maybe %Intl.Collator%.[[SortLocaleData]]?

Match against the items of that list by best-fit rather than prefix, aligning with the rest of ECMA-402 (although the difference is not necessarily observable).

"best fit" matching isn't supported in browsers. (V8 has "--harmony-intl-best-fit-matcher", but that's not available by default.)

That's irrelevant, because LookupMatchingLocaleByBestFit is defined to produce results "at least as good as those produced by the LookupMatchingLocaleByPrefix algorithm" (and therefore any implementation is free to just reuse LookupMatchingLocaleByPrefix).
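
As a rough sketch of what prefix matching does (simplified, and not the spec text): each requested locale is tried in order, dropping trailing subtags until something in the Available Locales List matches. The function name `lookupByPrefix` and the `tailored` set are hypothetical, and Unicode extension sequences are deliberately not handled here.

```javascript
// Simplified sketch of prefix-based lookup over a list of requested locales.
// Assumption: inputs are already canonicalized and carry no "-u-" extensions.
function lookupByPrefix(requestedLocales, availableLocales) {
  for (const locale of requestedLocales) {
    let prefix = locale;
    while (prefix !== "") {
      if (availableLocales.has(prefix)) return prefix;
      let pos = prefix.lastIndexOf("-");
      if (pos === -1) pos = 0;
      // Also drop a dangling single-character subtag (e.g. the "x" in "de-x-foo").
      while (pos >= 2 && prefix[pos - 2] === "-") pos -= 2;
      prefix = prefix.slice(0, pos);
    }
  }
  return undefined;
}

// Hypothetical Available Locales List holding only languages with tailored case mappings.
const tailored = new Set(["az", "lt", "tr"]);

console.log(lookupByPrefix(["zzz", "tr-TR"], tailored)); // "tr"
console.log(lookupByPrefix(["en", "tr"], tailored));     // "tr" ("en" is skipped)
console.log(lookupByPrefix(["zzz"], tailored));          // undefined
```

With a list like `tailored`, "en" is passed over in favor of "tr", which is the concern raised above about which locales the list should contain.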

When the list is not empty but no matching locale is found, default to DefaultLocale() rather than "und", aligning with empty-list behavior and with the rest of ECMA-402 [...]

That means "I".toLocaleLowerCase("und") can now return either "i" or "ı", depending on the user-locale.

I believe that would also be addressed via the Available Locales List provided to ResolveLocale, but regardless should align with other Intl services in general and Intl.Collator in particular. Probably, "und" should just always be considered available, but definitely should not be given special treatment exclusively in TransformCase.

anba · 2025-01-31T08:18:18Z

Ah yeah, I guess the Available Locales List needs to include more than just locale identifiers with language-sensitive case mappings. Any thoughts on what it should be? Maybe %Intl.Collator%.[[SortLocaleData]]?

If I had to guess, I'd say one of the reasons locale case conversion works differently from the other APIs is that it's difficult to find an appropriate Available Locales List. I'm not sure Intl.Collator is a good fit.

That's irrelevant, because LookupMatchingLocaleByBestFit is defined to produce results "at least as good as those produced by the LookupMatchingLocaleByPrefix algorithm" (and therefore any implementation is free to just reuse LookupMatchingLocaleByPrefix).

I had assumed "the difference is not necessarily observable" was in reference to actual browser behaviour. If we assume an implementation that supports "best fit", which most likely uses the data from https://github.com/unicode-org/cldr/blob/main/common/supplemental/languageInfo.xml, then it's possible to have observable differences. There are three relevant entries:

<languageMatch desired="ku" supported="tr" distance="30" oneway="true"/>
<languageMatch desired="azb" supported="az" distance="10" oneway="true"/>
<languageMatch desired="az" supported="ru" distance="30" oneway="true"/>

That means "I".toLocaleLowerCase("ku") may fall back to "I".toLocaleLowerCase("tr"), because there's the fallback "ku" → "tr". For example, V8 doesn't ship locale data for Kurdish (Intl.Collator.supportedLocalesOf("ku") returns the empty array), so if V8 started to officially support the "best fit" matcher and string case conversion were tied to the Intl.Collator Available Locales List, then "I".toLocaleLowerCase("ku") could start to return the dotless i (U+0131).
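
A toy model of that scenario, to make the fallback concrete. This is not how any engine implements best-fit matching (which is implementation-defined); `toyBestFit`, `MAX_DISTANCE`, and the `available` set are hypothetical, and only the three languageMatch entries come from the data quoted above.

```javascript
// Toy best-fit matcher driven by the three languageMatch entries quoted above.
const languageMatch = [
  { desired: "ku", supported: "tr", distance: 30 },
  { desired: "azb", supported: "az", distance: 10 },
  { desired: "az", supported: "ru", distance: 30 },
];

const MAX_DISTANCE = 40; // assumption: threshold chosen purely for illustration

function toyBestFit(requested, availableLocales) {
  if (availableLocales.has(requested)) return requested; // exact match wins
  let best;
  for (const entry of languageMatch) {
    if (entry.desired !== requested) continue;
    if (!availableLocales.has(entry.supported)) continue;
    if (entry.distance > MAX_DISTANCE) continue;
    if (best === undefined || entry.distance < best.distance) best = entry;
  }
  return best ? best.supported : undefined;
}

// Hypothetical Available Locales List that lacks Kurdish.
const available = new Set(["az", "en", "ru", "tr"]);

console.log(toyBestFit("ku", available));  // "tr"
console.log(toyBestFit("azb", available)); // "az"
console.log(toyBestFit("fr", available));  // undefined
```

Under this model a request for "ku" resolves to "tr", so Turkish case tailoring would apply to Kurdish input.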

I believe that would also be addressed via the Available Locales List provided to ResolveLocale, but regardless should align with other Intl services in general and Intl.Collator in particular. Probably, "und" should just always be considered available, but definitely should not be given special treatment exclusively in TransformCase.

I gave "und" as a special case, because at least for programmers with a Java background, using "und" shouldn't be too uncommon. (Java's String case conversion methods use java.util.Locale.getDefault() by default, which can result in bugs when the default locale is Turkish/Azeri. Instead it's necessary to use str.toLowerCase(Locale.ROOT).)
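
For comparison, JavaScript already has a locale-independent counterpart in String.prototype.toLowerCase, which never consults the default locale. A minimal sketch (the helper name `normalizeKey` is hypothetical):

```javascript
// toLowerCase() is locale-independent; toLocaleLowerCase() may apply
// language-specific tailorings depending on its argument or the default locale.
function normalizeKey(s) {
  return s.toLowerCase();
}

console.log(normalizeKey("TITLE")); // "title"
console.log(normalizeKey("I"));     // "i", regardless of the default locale
```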

gibson042 · 2025-02-18T17:10:43Z

Updated per TG2 discussion.

gibson042 added 2 commits January 30, 2025 10:17

Editorial: Make ResolveLocale parameter localeData optional

239f27c

Normative: Update String toLocale{Lower,Upper}Case to ResolveLocale w…

70accc6

…ith best-fit matching Fixes tc39#896

gibson042 added 2 commits February 18, 2025 12:09

fixup! Normative: Update String toLocale{Lower,Upper}Case to ResolveL…

389a885

…ocale with best-fit matching

Editorial: Add TransformCase note about efficiency

d1fb28f
